Preprint #04-9 REGRESSION ANALYSIS WITH LINKED DATA

Authors

  • P. Lahiri
  • Michael D. Larsen

Abstract

Record linkage, or exact matching, can be used to join together two files that contain information on the same individuals, but lack unique personal identification codes. The possibility of errors in linkage causes problems for estimating the relationships between variables on the two files. The effect is analogous to the impact of measurement error. A model of a linear regression relationship between variables in linked files is proposed. Assuming the probabilities that pairs of records are links are known, an unbiased estimator of the regression coefficients is derived. Methods for estimating the linkage probabilities by using mixture models are discussed. A consistent estimator of the covariance matrix of this estimator is also given. A bootstrap estimator is used to reflect the impact of the uncertainty in record linkage model parameters on the estimators of the regression parameters. A simulation study compares the performance of the proposed estimator and alternatives.
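The central idea, that known linkage probabilities can remove the bias caused by linkage errors, can be illustrated with a small simulation. This is a minimal sketch under assumptions the abstract does not state: one covariate, a hypothetical per-record probability q_i that record i is linked to its true partner, and false links that fall uniformly on the other n - 1 records. Because each row of linkage probabilities sums to one, E(z_i) = Σ_j q_ij (β0 + β1 x_j) = β0 + β1 w_i with w_i = Σ_j q_ij x_j. Regressing the linked responses z on w instead of x therefore gives a slope that is unbiased conditional on the design, while the naive regression on x is attenuated. The names `ols_2` and `simulate` and all parameter values are illustrative; this is not necessarily the exact estimator or simulation design used in the paper.

```python
import random
import statistics


def ols_2(u, z):
    """Least-squares fit of z on (1, u); returns (intercept, slope)."""
    mu, mz = statistics.fmean(u), statistics.fmean(z)
    sxx = sum((a - mu) ** 2 for a in u)
    sxz = sum((a - mu) * (b - mz) for a, b in zip(u, z))
    slope = sxz / sxx
    return mz - slope * mu, slope


def simulate(n=200, beta=(1.0, 2.0), sigma=1.0, reps=500, seed=1):
    """Average naive and linkage-adjusted slopes over `reps` replications."""
    rng = random.Random(seed)
    # Fixed design: covariate x_i in file A and a hypothetical probability
    # q_i that record i is linked to its true partner in file B.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    q = [rng.uniform(0.6, 1.0) for _ in range(n)]
    s = sum(x)
    # Assumed error model: a false link picks one of the other n - 1 records
    # uniformly, so q_ij = (1 - q_i) / (n - 1) for j != i, and
    # w_i = sum_j q_ij x_j.
    w = [qi * xi + (1.0 - qi) * (s - xi) / (n - 1) for qi, xi in zip(q, x)]
    naive, adjusted = [], []
    for _ in range(reps):
        # True responses in file B, aligned with file A by index.
        y = [beta[0] + beta[1] * xi + rng.gauss(0.0, sigma) for xi in x]
        z = []
        for i in range(n):
            j = i
            if rng.random() >= q[i]:      # linkage error with prob. 1 - q_i
                j = rng.randrange(n - 1)  # choose a wrong partner ...
                if j >= i:
                    j += 1                # ... skipping the true one
            z.append(y[j])
        naive.append(ols_2(x, z)[1])      # ignores linkage errors
        adjusted.append(ols_2(w, z)[1])   # regresses z on w = Qx
    return statistics.fmean(naive), statistics.fmean(adjusted)
```

Calling `naive, adjusted = simulate()` should give an average adjusted slope close to the true value 2.0, while the naive slope falls clearly below it (roughly the true slope times a weighted average of the q_i). In practice the q_ij would themselves be estimated, for example from a mixture model as the abstract mentions, which is why the authors add a bootstrap step.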

Similar sources

Bayesian and Iterative Maximum Likelihood Estimation of the Coefficients in Logistic Regression Analysis with Linked Data

This paper considers logistic regression analysis with linked data. It is shown that, in logistic regression analysis with linked data, a finite mixture of Bernoulli distributions can be used to model the response variables. We propose an iterative maximum likelihood estimator for the regression coefficients that takes the matching probabilities into account. Next, the Bayesian counterpart...

arXiv:hep-ph/9904362v1 16 Apr 1999 Preprint SSU-HEP-99/04

The proton structure and proton polarizability corrections to the Lamb shift of electronic hydrogen and muonic hydrogen were evaluated on the basis of modern experimental data on deep inelastic structure functions. The numerical value of the proton polarizability contribution to the (2P-2S) Lamb shift is 4.4 GHz.

The Frequencies of three Factor IX-Linked Restriction Fragment Length Polymorphisms in Iranian Patients with Hemophilia B

Background: Hemophilia B is an X-linked recessive coagulation disorder caused by factor IX deficiency. Analysis of factor IX gene polymorphisms is considered the best approach for prenatal diagnosis and carrier detection of hemophilia B where the identification of gene mutation is not easily possible. Objective: To study the frequency of three factor IX-linked restriction fragment length polym...

arXiv:hep-ex/0411065v2 26 Feb 2005 BELLE Belle Preprint 2004-34 KEK Preprint 2004-69 Observation of B+ →

We report measurements of radiative B decays with Kηγ final states, using a data sample of 253 fb⁻¹ recorded at the Υ(4S) resonance with the Belle detector at the KEKB e+e− storage ring. We observe B+ → K+ηγ for the first time with a

Factors Influencing Drug Injection History among Prisoners: A Comparison between Classification and Regression Trees and Logistic Regression Analysis

Background: Given the importance of medical studies, researchers in this field should be familiar with various types of statistical analyses so that they can select the most appropriate method for the characteristics of their data sets. Classification and regression trees (CARTs) can serve as a complement to regression models. We compared the performance of a logistic regression model and a CART in predi...

Publication date: 2004